<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />

    <meta http-equiv="x-ua-compatible" content="ie=edge">
    
    <title>2.3.1. 深化用户兴趣表示 &#8212; FunRec 推荐系统 0.0.1 documentation</title>

    <link rel="stylesheet" href="../../_static/material-design-lite-1.3.0/material.blue-deep_orange.min.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/sphinx_materialdesign_theme.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fontawesome/all.css" type="text/css" />
    <link rel="stylesheet" href="../../_static/fonts.css" type="text/css" />
    <link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/basic.css" />
    <link rel="stylesheet" type="text/css" href="../../_static/d2l.css" />
    <script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
    <script src="../../_static/jquery.js"></script>
    <script src="../../_static/underscore.js"></script>
    <script src="../../_static/_sphinx_javascript_frameworks_compat.js"></script>
    <script src="../../_static/doctools.js"></script>
    <script src="../../_static/sphinx_highlight.js"></script>
    <script src="../../_static/d2l.js"></script>
    <script async="async" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
    <link rel="index" title="Index" href="../../genindex.html" />
    <link rel="search" title="Search" href="../../search.html" />
    <link rel="next" title="2.3.2. 生成式召回方法" href="2.generateive_recall.html" />
    <link rel="prev" title="2.3. 序列召回" href="index.html" /> 
  </head>
<body>
    <div class="mdl-layout mdl-js-layout mdl-layout--fixed-header mdl-layout--fixed-drawer"><header class="mdl-layout__header mdl-layout__header--waterfall ">
    <div class="mdl-layout__header-row">
        
        <nav class="mdl-navigation breadcrumb">
            <a class="mdl-navigation__link" href="../index.html"><span class="section-number">2. </span>召回模型</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link" href="index.html"><span class="section-number">2.3. </span>序列召回</a><i class="material-icons">navigate_next</i>
            <a class="mdl-navigation__link is-active"><span class="section-number">2.3.1. </span>深化用户兴趣表示</a>
        </nav>
        <div class="mdl-layout-spacer"></div>
        <nav class="mdl-navigation">
        
<form class="form-inline pull-sm-right" action="../../search.html" method="get">
      <div class="mdl-textfield mdl-js-textfield mdl-textfield--expandable mdl-textfield--floating-label mdl-textfield--align-right">
        <label id="quick-search-icon" class="mdl-button mdl-js-button mdl-button--icon"  for="waterfall-exp">
          <i class="material-icons">search</i>
        </label>
        <div class="mdl-textfield__expandable-holder">
          <input class="mdl-textfield__input" type="text" name="q"  id="waterfall-exp" placeholder="Search" />
          <input type="hidden" name="check_keywords" value="yes" />
          <input type="hidden" name="area" value="default" />
        </div>
      </div>
      <div class="mdl-tooltip" data-mdl-for="quick-search-icon">
      Quick search
      </div>
</form>
        
<a id="button-show-source"
    class="mdl-button mdl-js-button mdl-button--icon"
    href="../../_sources/chapter_1_retrieval/3.sequence/1.user_interests.rst.txt" rel="nofollow">
  <i class="material-icons">code</i>
</a>
<div class="mdl-tooltip" data-mdl-for="button-show-source">
Show Source
</div>
        </nav>
    </div>
    <div class="mdl-layout__header-row header-links">
      <div class="mdl-layout-spacer"></div>
      <nav class="mdl-navigation">
          
              <a  class="mdl-navigation__link" href="https://funrec-notebooks.s3.eu-west-3.amazonaws.com/fun-rec.zip">
                  <i class="fas fa-download"></i>
                  Jupyter 记事本
              </a>
          
              <a  class="mdl-navigation__link" href="https://github.com/datawhalechina/fun-rec">
                  <i class="fab fa-github"></i>
                  GitHub
              </a>
      </nav>
    </div>
</header><header class="mdl-layout__drawer">
    
          <!-- Title -->
      <span class="mdl-layout-title">
          <a class="title" href="../../index.html">
              <span class="title-text">
                  FunRec 推荐系统
              </span>
          </a>
      </span>
    
    
      <div class="globaltoc">
        <span class="mdl-layout-title toc">Table Of Contents</span>
        
        
            
            <nav class="mdl-navigation">
                <ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_preface/index.html">前言</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_installation/index.html">安装</a></li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_notation/index.html">符号</a></li>
</ul>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../../chapter_0_introduction/index.html">1. 推荐系统概述</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/1.intro.html">1.1. 推荐系统是什么？</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_0_introduction/2.outline.html">1.2. 本书概览</a></li>
</ul>
</li>
<li class="toctree-l1 current"><a class="reference internal" href="../index.html">2. 召回模型</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="../1.cf/index.html">2.1. 协同过滤</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../1.cf/1.itemcf.html">2.1.1. 基于物品的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../1.cf/2.usercf.html">2.1.2. 基于用户的协同过滤</a></li>
<li class="toctree-l3"><a class="reference internal" href="../1.cf/3.mf.html">2.1.3. 矩阵分解</a></li>
<li class="toctree-l3"><a class="reference internal" href="../1.cf/4.summary.html">2.1.4. 总结</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../2.embedding/index.html">2.2. 向量召回</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../2.embedding/1.i2i.html">2.2.1. I2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../2.embedding/2.u2i.html">2.2.2. U2I召回</a></li>
<li class="toctree-l3"><a class="reference internal" href="../2.embedding/3.summary.html">2.2.3. 总结</a></li>
</ul>
</li>
<li class="toctree-l2 current"><a class="reference internal" href="index.html">2.3. 序列召回</a><ul class="current">
<li class="toctree-l3 current"><a class="current reference internal" href="#">2.3.1. 深化用户兴趣表示</a></li>
<li class="toctree-l3"><a class="reference internal" href="2.generateive_recall.html">2.3.2. 生成式召回方法</a></li>
<li class="toctree-l3"><a class="reference internal" href="3.summary.html">2.3.3. 总结</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_2_ranking/index.html">3. 精排模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_2_ranking/1.wide_and_deep.html">3.1. 记忆与泛化</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_2_ranking/2.feature_crossing/index.html">3.2. 特征交叉</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/2.feature_crossing/1.second_order.html">3.2.1. 二阶特征交叉</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/2.feature_crossing/2.higher_order.html">3.2.2. 高阶特征交叉</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_2_ranking/3.sequence.html">3.3. 序列建模</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_2_ranking/4.multi_objective/index.html">3.4. 多目标建模</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/4.multi_objective/1.arch.html">3.4.1. 基础结构演进</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/4.multi_objective/2.dependency_modeling.html">3.4.2. 任务依赖建模</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/4.multi_objective/3.multi_loss_optim.html">3.4.3. 多目标损失融合</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_2_ranking/5.multi_scenario/index.html">3.5. 多场景建模</a><ul>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/5.multi_scenario/1.multi_tower.html">3.5.1. 多塔结构</a></li>
<li class="toctree-l3"><a class="reference internal" href="../../chapter_2_ranking/5.multi_scenario/2.dynamic_weight.html">3.5.2. 动态权重建模</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_3_rerank/index.html">4. 重排模型</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/1.greedy.html">4.1. 基于贪心的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/2.personalized.html">4.2. 基于个性化的重排</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_3_rerank/3.summary.html">4.3. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_4_trends/index.html">5. 难点及热点研究</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/1.debias.html">5.1. 模型去偏</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/2.cold_start.html">5.2. 冷启动问题</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/3.generative.html">5.3. 生成式推荐</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_4_trends/4.summary.html">5.4. 本章小结</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_5_projects/index.html">6. 项目实践</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/1.understanding.html">6.1. 赛题理解</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/2.baseline.html">6.2. Baseline</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/3.analysis.html">6.3. 数据分析</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/4.recall.html">6.4. 多路召回</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/5.feature_engineering.html">6.5. 特征工程</a></li>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_5_projects/6.ranking.html">6.6. 排序模型</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_appendix/index.html">7. Appendix</a><ul>
<li class="toctree-l2"><a class="reference internal" href="../../chapter_appendix/word2vec.html">7.1. Word2vec</a></li>
</ul>
</li>
</ul>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../../chapter_references/references.html">参考文献</a></li>
</ul>

            </nav>
        
        </div>
    
</header>
        <main class="mdl-layout__content" tabIndex="0">

	<script type="text/javascript" src="../../_static/sphinx_materialdesign_theme.js"></script>

    <div class="document">
        <div class="page-content" role="main">
        
  <section id="user-interests">
<span id="id1"></span><h1><span class="section-number">2.3.1. </span>深化用户兴趣表示<a class="headerlink" href="#user-interests" title="Permalink to this heading">¶</a></h1>
<p>传统的向量召回方法，如双塔模型，倾向于将用户所有的历史行为“压扁”成一个单一的静态向量。这种“平均化”的处理方式虽然高效，但在两个关键方面存在明显局限：首先，它无法表达用户兴趣的多样性，例如一个用户可能既是需要购买专业书籍的程序员，又是一位需要选购婴儿用品的新手父亲，单一向量很难同时兼顾这两种截然不同的需求；其次，它忽略了兴趣的时效性，无法区分用户长期稳定的爱好（如对摄影的持续关注）和临时的即时需求（如今天突然搜索“感冒药”），而后者往往更能预示下一次的交互行为。</p>
<p>为了构建一个更丰富、更立体的用户画像以实现更精准的召回，研究者们沿着“深化用户兴趣表示”的路径进行了持续探索。本章将介绍其中的两个代表性模型：MIND
和
SDM。我们将首先探讨如何使用多个向量来表示用户的多元兴趣，然后在此基础上，进一步融入时间维度，学习如何动态地捕捉用户兴趣的演化。</p>
<section id="mind">
<h2><span class="section-number">2.3.1.1. </span>MIND：用多个向量捕捉用户的多元兴趣<a class="headerlink" href="#mind" title="Permalink to this heading">¶</a></h2>
<p>想象一下，你在淘宝上的购物历史：今天买了一本编程书，昨天买了运动鞋，上周买了咖啡豆。如果推荐系统只用一个数字向量来描述你，就像是用一个标签来概括一个人的全部——显然是不够的。</p>
<p>MIND (Multi-Interest Network with Dynamic Routing) <span id="id2">(<a class="reference internal" href="../../chapter_references/references.html#id36" title="Li, C., Liu, Z., Wu, M., Xu, Y., Zhao, H., Huang, P., … Lee, D. L. (2019). Multi-interest network with dynamic routing for recommendation at tmall. Proceedings of the 28th ACM international conference on information and knowledge management (pp. 2615–2623).">Li <em>et al.</em>, 2019</a>)</span>
模型提出了一个更符合直觉的想法：既然用户有多种兴趣，为什么不用多个向量来分别表示呢？就像给每个兴趣爱好都分配一个专门的“代言人”。</p>
<p>这个模型的巧妙之处在于借鉴了胶囊网络的动态路由思想。简单来说，它会自动把你的行为按照兴趣类型进行分组——编程相关的行为归为一类，运动相关的归为另一类，美食相关的又是一类。每一类都会生成一个专门的兴趣向量，这样推荐系统就能更精准地理解你在不同场景下的需求。</p>
<figure class="align-default" id="id5">
<span id="mind-model-architecture"></span><a class="reference internal image-reference" href="../../_images/mind_model_architecture.png"><img alt="../../_images/mind_model_architecture.png" src="../../_images/mind_model_architecture.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">图2.3.1 </span><span class="caption-text">MIND模型整体结构</span><a class="headerlink" href="#id5" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p>从整体架构来看，除了常规的Embedding层，MIND模型还包含了两个重要的组件：多兴趣提取层和Label-Aware注意力层。</p>
<p><strong>多兴趣提取</strong></p>
<p>MIND模型的多兴趣提取技术源于对胶囊网络动态路由机制的创新性改进。胶囊网络
<span id="id3">(<a class="reference internal" href="../../chapter_references/references.html#id37" title="Sabour, S., Frosst, N., &amp; Hinton, G. E. (2017). Dynamic routing between capsules. Advances in neural information processing systems, 30.">Sabour <em>et al.</em>, 2017</a>)</span>
最初在计算机视觉领域被提出，其核心思想是用向量而非标量来表示特征，向量的方向编码属性信息，长度表示存在概率。动态路由则是确定不同层级胶囊之间连接强度的算法，它通过迭代优化的方式实现输入特征的软聚类。这种软聚类机制的优势在于，它不需要预先定义聚类数量或类别边界，而是让数据自然地分组，这正好契合了用户兴趣发现的需求。MIND模型引入了这一思想并提出了行为到兴趣（Behavior
to
Interest，B2I）动态路由：将用户的历史行为视为行为胶囊，将用户的多重兴趣视为兴趣胶囊，通过动态路由算法将相关的行为聚合到对应的兴趣维度中。MIND模型针对推荐系统的特点对原始动态路由算法进行了三个关键改进：</p>
<ol class="arabic">
<li><p><strong>共享变换矩阵</strong>。与原始胶囊网络为每对胶囊使用独立变换矩阵不同，MIND采用共享的双线性映射矩阵
<span class="math notranslate nohighlight">\(S \in \mathbb{R}^{d \times d}\)</span>。这种设计有两个重要考虑：首先，用户行为序列长度变化很大，从几十到几百不等，共享矩阵确保了算法的通用性；其次，共享变换保证所有兴趣向量位于同一表示空间，便于后续的相似度计算和检索操作。路由连接强度的计算公式为：</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-0">
<span class="eqno">(2.3.1)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-0" title="Permalink to this equation">¶</a></span>\[b_{ij} = \boldsymbol{u}_j^T \boldsymbol{S} \boldsymbol{e}_i\]</div>
<p>其中 <span class="math notranslate nohighlight">\(\boldsymbol{e}_i\)</span> 表示用户历史行为 <span class="math notranslate nohighlight">\(i\)</span>
的物品向量，<span class="math notranslate nohighlight">\(\boldsymbol{u}_j\)</span> 表示第 <span class="math notranslate nohighlight">\(j\)</span>
个兴趣胶囊的向量，<span class="math notranslate nohighlight">\(b_{ij}\)</span> 衡量行为 <span class="math notranslate nohighlight">\(i\)</span> 与兴趣
<span class="math notranslate nohighlight">\(j\)</span> 的关联程度。</p>
</li>
<li><p><strong>随机初始化策略</strong>。为避免所有兴趣胶囊收敛到相同状态，算法采用高斯分布随机初始化路由系数
<span class="math notranslate nohighlight">\(b_{ij}\)</span>。这一策略类似于K-Means聚类中的随机中心初始化，确保不同兴趣胶囊能够捕捉用户兴趣的不同方面。</p></li>
<li><p><strong>自适应兴趣数量</strong>。考虑到不同用户的兴趣复杂度差异很大，MIND引入了动态兴趣数量机制：</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-1">
<span class="eqno">(2.3.2)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-1" title="Permalink to this equation">¶</a></span>\[K_u' = \max(1, \min(K, \log_2 (|\mathcal{I}_u|)))\]</div>
<p>其中 <span class="math notranslate nohighlight">\(|\mathcal{I}_u|\)</span> 表示用户 <span class="math notranslate nohighlight">\(u\)</span>
的历史行为数量，<span class="math notranslate nohighlight">\(K\)</span>
是预设的最大兴趣数。这种设计为行为较少的用户节省计算资源，同时为活跃用户提供更丰富的兴趣表示。</p>
</li>
</ol>
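<p>The adaptive interest count in Eq. (2.3.2) can be made concrete in a few lines of plain Python. This is a minimal sketch, assuming the floor of <code>log2</code> is taken; the helper name <code>adaptive_interest_num</code> is hypothetical and not from the FunRec codebase:</p>

```python
import math

def adaptive_interest_num(history_len, k_max):
    # K_u' = max(1, min(K, log2(|I_u|))), taking the floor of log2 (our assumption)
    if history_len < 2:
        return 1
    return max(1, min(k_max, int(math.log2(history_len))))

print(adaptive_interest_num(100, 8))  # a user with 100 behaviors gets 6 capsules
print(adaptive_interest_num(1, 8))    # a user with a single behavior gets 1
```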
<p>改进后的动态路由过程以迭代方式进行：路由系数 <span class="math notranslate nohighlight">\(b_{ij}\)</span> 在迭代开始前按上文的随机初始化策略从高斯分布中采样，随后每轮迭代交替更新路由系数和兴趣胶囊向量，直至收敛。公式 (2.3.1) 给出了路由系数 <span class="math notranslate nohighlight">\(b_{ij}\)</span> 的计算形式，而关键的兴趣胶囊向量 <span class="math notranslate nohighlight">\(\boldsymbol{u}_j\)</span> 则通过以下步骤得到，这本质上是一个软聚类算法：</p>
<ol class="arabic">
<li><p><strong>计算路由权重：</strong>对于每一个历史行为（低层胶囊
<span class="math notranslate nohighlight">\(i\)</span>），其分配到各个兴趣（高层胶囊 <span class="math notranslate nohighlight">\(j\)</span>）的权重
<span class="math notranslate nohighlight">\(w_{ij}\)</span> 通过对路由系数 <span class="math notranslate nohighlight">\(b_{ij}\)</span> 进行Softmax操作得到。</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-2">
<span class="eqno">(2.3.3)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-2" title="Permalink to this equation">¶</a></span>\[w_{ij} = \frac{\exp{b_{ij}}}{\sum_{k=1}^{K_u'} \exp{b_{ik}}}\]</div>
<p>这里的 <span class="math notranslate nohighlight">\(w_{ij}\)</span> 可以理解为行为 <span class="math notranslate nohighlight">\(i\)</span> 属于兴趣 <span class="math notranslate nohighlight">\(j\)</span>
的“软分配”概率。</p>
</li>
<li><p><strong>聚合行为以形成兴趣向量：</strong>每一个兴趣胶囊的初步向量
<span class="math notranslate nohighlight">\(\boldsymbol{z}_j\)</span> 是通过对所有行为向量
<span class="math notranslate nohighlight">\(\boldsymbol{e}_i\)</span>
进行加权求和得到的。每个行为向量在求和前会先经过共享变换矩阵
<span class="math notranslate nohighlight">\(\boldsymbol{S}\)</span> 的转换。</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-3">
<span class="eqno">(2.3.4)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-3" title="Permalink to this equation">¶</a></span>\[\boldsymbol{z}_j = \sum_{i\in \mathcal{I}_u} w_{ij} \boldsymbol{S} \boldsymbol{e}_i\]</div>
<p>这一步是聚类的核心：根据刚刚算出的权重，将相关的用户行为聚合起来，形成代表特定兴趣的向量。</p>
</li>
<li><p><strong>非线性压缩：</strong>为了将向量的模长（长度）约束在 [0, 1)
区间内，同时不改变其方向，模型使用了一个非线性的“squash”函数，从而得到本轮迭代的最终兴趣胶囊向量
<span class="math notranslate nohighlight">\(\boldsymbol{u}_j\)</span>。向量的长度可以被解释为该兴趣存在的概率，而其方向则编码了兴趣的具体属性。</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-4">
<span class="eqno">(2.3.5)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-4" title="Permalink to this equation">¶</a></span>\[\boldsymbol{u}_j = \text{squash}(\boldsymbol{z}_j) = \frac{\left\lVert \boldsymbol{z}_j \right\rVert ^ 2}{1 + \left\lVert \boldsymbol{z}_j \right\rVert ^ 2} \frac{\boldsymbol{z}_j}{\left\lVert \boldsymbol{z}_j \right\rVert}\]</div>
</li>
<li><p><strong>更新路由系数：</strong>
最后，根据新生成的兴趣胶囊 <span class="math notranslate nohighlight">\(\boldsymbol{u}_j\)</span> 和行为向量
<span class="math notranslate nohighlight">\(\boldsymbol{e}_i\)</span>
之间的一致性（通过点积衡量），来更新下一轮迭代的路由系数
<span class="math notranslate nohighlight">\(b_{ij}\)</span>。</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-5">
<span class="eqno">(2.3.6)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-5" title="Permalink to this equation">¶</a></span>\[b_{ij} \leftarrow b_{ij} + \boldsymbol{u}_j^T \boldsymbol{S} \boldsymbol{e}_i\]</div>
</li>
</ol>
<p>以上四个步骤会重复进行固定的次数（通常为3次），最终输出收敛后的兴趣胶囊向量集合
<span class="math notranslate nohighlight">\(\{\boldsymbol{u}_j, j=1,...,K_{u}^\prime\}\)</span>
作为该用户的多兴趣表示。</p>
<p><strong>核心代码</strong></p>
<p>MIND的核心在于胶囊网络的动态路由实现。在每次迭代中，模型首先通过softmax计算路由权重，然后通过双线性变换聚合行为向量，最后使用squash函数进行非线性压缩：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 动态路由的核心循环</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iteration_times</span><span class="p">):</span>  <span class="c1"># 通常迭代3次</span>
    <span class="c1"># 1. 计算路由权重 w_ij</span>
    <span class="n">routing_logits_with_padding</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">mask</span><span class="p">,</span> <span class="n">mask_routing_logits</span><span class="p">,</span> <span class="n">pad</span><span class="p">)</span>
    <span class="n">weight</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">routing_logits_with_padding</span><span class="p">)</span>  <span class="c1"># [B, k_max, max_len]</span>

    <span class="c1"># 2. 通过共享的双线性映射矩阵 S 变换行为嵌入</span>
    <span class="n">behavior_embdding_mapping</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">tensordot</span><span class="p">(</span>
        <span class="n">behavior_embddings</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">bilinear_mapping_matrix</span><span class="p">,</span> <span class="n">axes</span><span class="o">=</span><span class="mi">1</span>
    <span class="p">)</span>  <span class="c1"># [B, max_len, out_units]</span>

    <span class="c1"># 3. 加权聚合形成兴趣胶囊</span>
    <span class="n">Z</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">weight</span><span class="p">,</span> <span class="n">behavior_embdding_mapping</span><span class="p">)</span>  <span class="c1"># [B, k_max, out_units]</span>
    <span class="n">interest_capsules</span> <span class="o">=</span> <span class="n">squash</span><span class="p">(</span><span class="n">Z</span><span class="p">)</span>  <span class="c1"># 非线性压缩到 [0, 1)</span>

    <span class="c1"># 4. 更新路由系数：基于兴趣胶囊与行为的一致性</span>
    <span class="n">delta_routing_logits</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span>
        <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">interest_capsules</span><span class="p">,</span> <span class="n">tf</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">behavior_embdding_mapping</span><span class="p">,</span> <span class="n">perm</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">])),</span>
        <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span>
    <span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">routing_logits</span><span class="o">.</span><span class="n">assign_add</span><span class="p">(</span><span class="n">delta_routing_logits</span><span class="p">)</span>
</pre></div>
</div>
<p>这里的squash函数实现了向量长度的非线性压缩，确保输出向量的模长在<span class="math notranslate nohighlight">\([0, 1)\)</span>区间内：</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">squash</span><span class="p">(</span><span class="n">inputs</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;非线性压缩函数，将向量长度映射到 [0, 1) 区间&quot;&quot;&quot;</span>
    <span class="n">vec_squared_norm</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">square</span><span class="p">(</span><span class="n">inputs</span><span class="p">),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">scalar_factor</span> <span class="o">=</span> <span class="n">vec_squared_norm</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">vec_squared_norm</span><span class="p">)</span> <span class="o">/</span> <span class="n">tf</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">vec_squared_norm</span> <span class="o">+</span> <span class="mf">1e-9</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">scalar_factor</span> <span class="o">*</span> <span class="n">inputs</span>
</pre></div>
</div>
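<p>As a quick numerical check, the same function can be re-implemented in NumPy (for intuition only; variable names here are illustrative). Short vectors are squashed toward zero, long vectors toward a norm just below 1:</p>

```python
# NumPy re-implementation of squash, for intuition only; the chapter's
# actual version uses TensorFlow.
import numpy as np

def squash_np(inputs, eps=1e-9):
    # squared L2 norm along the last axis
    sq_norm = np.sum(np.square(inputs), axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm) / np.sqrt(sq_norm + eps)
    return scale * inputs

vectors = np.array([[0.1, 0.2],     # short vector
                    [3.0, 4.0],     # medium vector
                    [30.0, 40.0]])  # long vector
norms = np.linalg.norm(squash_np(vectors), axis=-1)
# norms grow with input length but always stay below 1
```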
<p><strong>Label-Aware Attention</strong></p>
<p>The multi-interest extraction layer produces several interest vectors per user, but during training we must decide which of them is most relevant to the current target item. Since the training data contains the “right answer” (the item the user actually clicked next), this label can supervise the model to pick out, among the interest vectors, the one most related to that answer, giving the model explicit guidance at training time. MIND introduces a label-aware attention layer for exactly this purpose.</p>
<p>This attention mechanism uses the target item vector as the query and the user's multiple interest vectors as the keys and values. The computation is:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-6">
<span class="eqno">(2.3.7)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-6" title="Permalink to this equation">¶</a></span>\[v_u = V_u \cdot \text{Softmax}(\text{pow}(V_u^T e_i, p))\]</div>
<p>where <span class="math notranslate nohighlight">\(V_u = (v_u^1, \ldots, v_u^K)\)</span> is the user's interest-capsule matrix, obtained by concatenating each interest capsule <span class="math notranslate nohighlight">\(\boldsymbol{u}\)</span> with the user profile embedding and passing the result through several ReLU layers (see <a class="reference internal" href="#mind-model-architecture"><span class="std std-numref">Fig. 2.3.1</span></a>). <span class="math notranslate nohighlight">\(e_i\)</span> is the embedding of the target item <span class="math notranslate nohighlight">\(i\)</span>, and <span class="math notranslate nohighlight">\(p\)</span> is a hyperparameter controlling how concentrated the attention is.</p>
<p>The parameter <span class="math notranslate nohighlight">\(p\)</span> shapes the attention distribution: as <span class="math notranslate nohighlight">\(p\)</span> approaches 0, all interest vectors receive equal attention; as <span class="math notranslate nohighlight">\(p\)</span> grows, attention concentrates on the interest vectors most similar to the target item; and as <span class="math notranslate nohighlight">\(p\)</span> tends to infinity, the mechanism degenerates into hard attention that selects only the single most similar interest vector. Experiments show that larger values of <span class="math notranslate nohighlight">\(p\)</span> speed up convergence.</p>
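<p>The effect of the exponent is easy to see numerically. In the toy demo below the similarity scores are made up and deliberately larger than 1 (as with unnormalized dot products), so raising them to a power stretches their gaps before the softmax:</p>

```python
# Toy demo of how the exponent p sharpens the label-aware attention
# distribution; pure NumPy, illustrative values only.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

sims = np.array([3.0, 2.0, 1.0, 0.5])  # interest-to-target similarities

w_p1 = softmax(np.power(sims, 1))  # p=1: attention spread over interests
w_p4 = softmax(np.power(sims, 4))  # p=4: almost all weight on the best one
```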
<p>Once label-aware attention yields the user vector <span class="math notranslate nohighlight">\(v_u\)</span>, MIND's training objective is to make that vector “match” the items the user actually interacted with: maximize the similarity between the user and positive items while minimizing similarity to negatives. Because the item catalog is typically huge, computing a probability distribution over every item is impractical, so MIND adopts the same strategy as YouTubeDNN: a Sampled Softmax loss, which approximates the global normalization by randomly sampling a small set of negatives.</p>
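<p>The idea behind the sampled approximation can be sketched in a few lines of NumPy. Sizes and names below are illustrative, and real sampled-softmax implementations additionally correct the candidate logits for the sampling probabilities, which is omitted here:</p>

```python
# Minimal NumPy sketch of the sampled-softmax idea: normalize over the
# positive item plus a few random negatives instead of the full catalog.
import numpy as np

rng = np.random.default_rng(0)
num_items, dim, num_neg = 10_000, 8, 50

item_emb = rng.normal(size=(num_items, dim))  # catalog item embeddings
user_vec = rng.normal(size=dim)               # user vector v_u
pos_item = 42                                 # the clicked item

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

# exact softmax cross-entropy: needs all num_items logits
logits = item_emb @ user_vec
full_loss = logsumexp(logits) - logits[pos_item]

# sampled approximation: positive + a small set of random negatives
neg = rng.choice(num_items, size=num_neg, replace=False)
neg = neg[neg != pos_item]
cand_logits = np.concatenate(([logits[pos_item]], logits[neg]))
sampled_loss = logsumexp(cand_logits) - cand_logits[0]
```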
<p>The implementation of label-aware attention is straightforward: use the target item vector as the query and compute its similarity to each interest vector:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">,</span> <span class="n">training</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="n">keys</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>  <span class="c1"># interest capsule vectors [batch_size, k_max, dim]</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">inputs</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>  <span class="c1"># target item vector [batch_size, 1, dim], broadcasts against keys</span>

    <span class="c1"># similarity between each interest vector and the target item</span>
    <span class="n">weight</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">keys</span> <span class="o">*</span> <span class="n">query</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="c1"># [batch_size, k_max, 1]</span>

    <span class="c1"># the power operation controls how concentrated the attention is</span>
    <span class="n">weight</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">pow</span><span class="p">(</span><span class="n">weight</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">pow_p</span><span class="p">)</span>

    <span class="c1"># if pow_p is large (&gt;= 100), pick the single most similar interest</span>
    <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">pow_p</span> <span class="o">&gt;=</span> <span class="mi">100</span><span class="p">:</span>
        <span class="n">idx</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">weight</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">output_type</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">int32</span><span class="p">)</span>
        <span class="n">output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">gather_nd</span><span class="p">(</span><span class="n">keys</span><span class="p">,</span> <span class="n">idx</span><span class="p">,</span> <span class="n">batch_dims</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [batch_size, dim]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="c1"># otherwise aggregate with softmax weights</span>
        <span class="n">weight</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">weight</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="n">keys</span> <span class="o">*</span> <span class="n">weight</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># [batch_size, dim]</span>

    <span class="k">return</span> <span class="n">output</span>
</pre></div>
</div>
<p><strong>Training and Evaluation</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span><span class="w"> </span><span class="nn">funrec</span><span class="w"> </span><span class="kn">import</span> <span class="n">run_experiment</span>

<span class="n">run_experiment</span><span class="p">(</span><span class="s1">&#39;mind&#39;</span><span class="p">)</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+---------------+--------------+-----------+----------+----------------+---------------+</span>
<span class="o">|</span>   <span class="n">hit_rate</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">hit_rate</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>   <span class="n">ndcg</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">ndcg</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>   <span class="n">precision</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">precision</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>
<span class="o">+===============+==============+===========+==========+================+===============+</span>
<span class="o">|</span>        <span class="mf">0.0036</span> <span class="o">|</span>       <span class="mf">0.0008</span> <span class="o">|</span>    <span class="mf">0.0012</span> <span class="o">|</span>   <span class="mf">0.0003</span> <span class="o">|</span>         <span class="mf">0.0004</span> <span class="o">|</span>        <span class="mf">0.0002</span> <span class="o">|</span>
<span class="o">+---------------+--------------+-----------+----------+----------------+---------------+</span>
</pre></div>
</div>
</section>
<section id="sdm">
<h2><span class="section-number">2.3.1.2. </span>SDM: Fusing Long- and Short-Term Interests to Capture Dynamic Change<a class="headerlink" href="#sdm" title="Permalink to this heading">¶</a></h2>
<p>MIND addresses the “breadth” of user interests, but a new problem follows: time. User interests are not only diverse but also evolve dynamically. What a user does within a single shopping session is usually a far better predictor of the next moment's need than what they did a month ago. MIND captures multiple interests, yet its architecture does not explicitly distinguish how recent they are. The Sequential Deep Matching model (SDM)
<span id="id4">(<a class="reference internal" href="../../chapter_references/references.html#id38" title="Lv, F., Jin, T., Yu, C., Sun, F., Lin, Q., Yang, K., &amp; Ng, W. (2019). Sdm: sequential deep matching model for online large-scale recommender system. Proceedings of the 28th ACM international conference on information and knowledge management (pp. 2635–2643).">Lv <em>et al.</em>, 2019</a>)</span> was proposed to address exactly this. Its core idea is to model the user's short-term immediate interests and long-term stable preferences separately, and then fuse the two intelligently.</p>
<figure class="align-default" id="id6">
<span id="sdm-model-architecture"></span><a class="reference internal image-reference" href="../../_images/sdm_model_architecture.png"><img alt="../../_images/sdm_model_architecture.png" src="../../_images/sdm_model_architecture.png" style="width: 600px;" /></a>
<figcaption>
<p><span class="caption-number">Fig. 2.3.2 </span><span class="caption-text">SDM model architecture</span><a class="headerlink" href="#id6" title="Permalink to this image">¶</a></p>
</figcaption>
</figure>
<p><strong>Capturing Short-Term Interest</strong></p>
<p>To capture short-term interest precisely, SDM uses a three-layer structure to process the user's current session sequence (bottom-left of <a class="reference internal" href="#sdm-model-architecture"><span class="std std-numref">Fig. 2.3.2</span></a>).</p>
<p>First, the item sequence of the short-term session is fed into an LSTM network to learn the temporal dependencies within the sequence. The standard LSTM computation is:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-7">
<span class="eqno">(2.3.8)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-7" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\boldsymbol{i} \boldsymbol{n}_{t}^{u} &amp;=\sigma\left(\boldsymbol{W}_{i n}^{1} \boldsymbol{e}_{i_{t}^{u}}+\boldsymbol{W}_{i n}^{2} \boldsymbol{h}_{t-1}^{u}+b_{i n}\right) \\
\boldsymbol{f}_{t}^{u} &amp;=\sigma\left(\boldsymbol{W}_{f}^{1} \boldsymbol{e}_{i_{t}^{u}}+\boldsymbol{W}_{f}^{2} \boldsymbol{h}_{t-1}^{u}+b_{f}\right) \\
\boldsymbol{o}_{t}^{u} &amp;=\sigma\left(\boldsymbol{W}_{o}^{1} \boldsymbol{e}_{i_{t}^{u}}+\boldsymbol{W}_{o}^{2} \boldsymbol{h}_{t-1}^{u}+b_{o}\right) \\
\boldsymbol{c}_{t}^{u} &amp;=\boldsymbol{f}_{t}^{u} \boldsymbol{c}_{t-1}^{u}+\boldsymbol{i} \boldsymbol{n}_{t}^{u} \tanh \left(\boldsymbol{W}_{c}^{1} \boldsymbol{e}_{i_{t}^{u}}+\boldsymbol{W}_{c}^{2} \boldsymbol{h}_{t-1}^{u}+b_{c}\right) \\
\boldsymbol{h}_{t}^{u} &amp;=\boldsymbol{o}_{t}^{u} \tanh \left(\boldsymbol{c}_{t}^{u}\right)
\end{aligned}\end{split}\]</div>
<p>Here <span class="math notranslate nohighlight">\(\boldsymbol{e}_{i_{t}^{u}}\)</span> is the item embedding at time step <span class="math notranslate nohighlight">\(t\)</span>, <span class="math notranslate nohighlight">\(\sigma\)</span> is the sigmoid activation, the <span class="math notranslate nohighlight">\(\boldsymbol{W}\)</span> are weight matrices, and the <span class="math notranslate nohighlight">\(b\)</span> are bias vectors. The LSTM runs in a many-to-many mode: every time step emits a hidden state <span class="math notranslate nohighlight">\(\boldsymbol{h}_{t}^{u} \in \mathbb{R}^{d \times 1}\)</span>, giving the sequence representation <span class="math notranslate nohighlight">\(\boldsymbol{X}^{u} = [\boldsymbol{h}_{1}^{u}, \ldots, \boldsymbol{h}_{t}^{u}]\)</span>.</p>
<p>The LSTM is introduced mainly to deal with a common problem in online shopping: users produce occasional random clicks, and these irrelevant actions disturb the sequence representation. The LSTM's gating mechanism helps the model retain the genuinely informative parts of the sequence.</p>
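<p>The recurrence of (2.3.8) can be transcribed almost line by line. The sketch below uses NumPy with randomly initialized, purely illustrative weights:</p>

```python
# One LSTM time step transcribed from Eq. (2.3.8); pure NumPy.
import numpy as np

rng = np.random.default_rng(0)
d = 4  # embedding / hidden size

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# one (input-weight, hidden-weight, bias) triple per gate
W = {g: (rng.normal(size=(d, d)), rng.normal(size=(d, d)), np.zeros(d))
     for g in ('in', 'f', 'o', 'c')}

def lstm_step(e_t, h_prev, c_prev):
    pre = lambda g: W[g][0] @ e_t + W[g][1] @ h_prev + W[g][2]
    in_t = sigmoid(pre('in'))                      # input gate
    f_t = sigmoid(pre('f'))                        # forget gate
    o_t = sigmoid(pre('o'))                        # output gate
    c_t = f_t * c_prev + in_t * np.tanh(pre('c'))  # new cell state
    h_t = o_t * np.tanh(c_t)                       # hidden state
    return h_t, c_t

h, c = np.zeros(d), np.zeros(d)
for _ in range(3):  # run a short item sequence
    h, c = lstm_step(rng.normal(size=d), h, c)
```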
<p>Next, SDM applies multi-head self-attention to capture the diversity of the user's interests. Each head is computed as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-8">
<span class="eqno">(2.3.9)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-8" title="Permalink to this equation">¶</a></span>\[\text{head}_{i}^{u}=\operatorname{Attention}\left(\boldsymbol{W}_{i}^{Q} \boldsymbol{X}^{u}, \boldsymbol{W}_{i}^{K} \boldsymbol{X}^{u}, \boldsymbol{W}_{i}^{V} \boldsymbol{X}^{u}\right)\]</div>
<p>The attention inside each head is computed as:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-9">
<span class="eqno">(2.3.10)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-9" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
&amp;f\left(Q_{i}^{u}, K_{i}^{u}\right)=Q_{i}^{u T} K_{i}^{u} \\
&amp;A_{i}^{u}=\operatorname{softmax}\left(f\left(Q_{i}^{u}, K_{i}^{u}\right)\right) \\
&amp;\operatorname{head}_{i}^{u}=V_{i}^{u} A_{i}^{u T}
\end{aligned}\end{split}\]</div>
<p>where <span class="math notranslate nohighlight">\(Q_{i}^{u}\)</span>, <span class="math notranslate nohighlight">\(K_{i}^{u}\)</span>, and <span class="math notranslate nohighlight">\(V_{i}^{u}\)</span> are the query, key, and value matrices of the <span class="math notranslate nohighlight">\(i\)</span>-th head, and <span class="math notranslate nohighlight">\(\boldsymbol{W}_{i}^{Q}\)</span>, <span class="math notranslate nohighlight">\(\boldsymbol{W}_{i}^{K}\)</span>, <span class="math notranslate nohighlight">\(\boldsymbol{W}_{i}^{V}\)</span> are the corresponding weight matrices.</p>
<p>The final multi-head attention output is:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-10">
<span class="eqno">(2.3.11)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-10" title="Permalink to this equation">¶</a></span>\[\hat{X}^{u}=\text{MultiHead}\left(X^{u}\right)=W^{O} \text{concat}\left(\text{head}_{1}^{u}, \ldots, \text{head}_{h}^{u}\right)\]</div>
<p>where <span class="math notranslate nohighlight">\(h\)</span> is the number of heads and <span class="math notranslate nohighlight">\(W^{O}\)</span> is the output weight matrix. Each head can focus on a different facet of interest, so the multi-head mechanism models the user's multiple interests in parallel.</p>
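<p>The head computation above can be sketched in NumPy using the same column-vector convention as the equations (random weights, purely illustrative):</p>

```python
# Multi-head self-attention in the column-vector convention of
# Eqs. (2.3.9)-(2.3.11); NumPy sketch with random illustrative weights.
import numpy as np

rng = np.random.default_rng(0)
d, t, num_heads = 8, 5, 2  # model dim, sequence length, heads
d_h = d // num_heads       # per-head dimension

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

X = rng.normal(size=(d, t))  # LSTM hidden states h_1..h_t as columns

heads = []
for _ in range(num_heads):
    Wq, Wk, Wv = (rng.normal(size=(d_h, d)) for _ in range(3))
    Q, K, V = Wq @ X, Wk @ X, Wv @ X  # each (d_h, t)
    A = softmax(Q.T @ K, axis=-1)     # (t, t) attention matrix
    heads.append(V @ A.T)             # head_i, shape (d_h, t)

Wo = rng.normal(size=(d, d))
X_hat = Wo @ np.concatenate(heads, axis=0)  # (d, t) multi-head output
```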
<p>Finally, SDM adds a personalized attention layer that uses the user profile vector <span class="math notranslate nohighlight">\(\boldsymbol{e}_u\)</span> as the query to weight the multi-head attention output:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-11">
<span class="eqno">(2.3.12)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-11" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\alpha_{k} &amp;=\frac{\exp\left(\hat{\boldsymbol{h}}_{k}^{u T} \boldsymbol{e}_{u}\right)}{\sum_{k=1}^{t} \exp\left(\hat{\boldsymbol{h}}_{k}^{u T} \boldsymbol{e}_{u}\right)} \\
\boldsymbol{s}_{t}^{u} &amp;=\sum_{k=1}^{t} \alpha_{k} \hat{\boldsymbol{h}}_{k}^{u}
\end{aligned}\end{split}\]</div>
<p>Here <span class="math notranslate nohighlight">\(\hat{\boldsymbol{h}}_{k}^{u}\)</span> is the hidden state at position <span class="math notranslate nohighlight">\(k\)</span> of the multi-head output <span class="math notranslate nohighlight">\(\hat{X}^{u}\)</span>, and <span class="math notranslate nohighlight">\(\alpha_{k}\)</span> is the corresponding attention weight. The result is the short-term interest representation <span class="math notranslate nohighlight">\(\boldsymbol{s}_{t}^{u} \in \mathbb{R}^{d \times 1}\)</span>, which blends in the user's personalized information.</p>
<p><strong>Core Code</strong></p>
<p>SDM's short-term interest modeling uses a three-layer architecture that step by step distills the user's immediate interest from the raw sequence:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># 1. sequence learning layer: LSTM captures temporal dependencies</span>
<span class="n">lstm_layer</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">LSTM</span><span class="p">(</span>
    <span class="n">emb_dim</span><span class="p">,</span>
    <span class="n">return_sequences</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span>  <span class="c1"># return outputs at every time step</span>
    <span class="n">recurrent_initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span>
<span class="p">)</span>
<span class="n">sequence_output</span> <span class="o">=</span> <span class="n">lstm_layer</span><span class="p">(</span><span class="n">short_history_item_emb</span><span class="p">)</span>  <span class="c1"># [batch_size, seq_len, dim]</span>

<span class="c1"># 2. multi-interest extraction: multi-head self-attention within the sequence</span>
<span class="n">norm_sequence_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">LayerNormalization</span><span class="p">()(</span><span class="n">sequence_output</span><span class="p">)</span>
<span class="n">sequence_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">MultiHeadAttention</span><span class="p">(</span>
    <span class="n">num_heads</span><span class="o">=</span><span class="n">num_heads</span><span class="p">,</span>
    <span class="n">key_dim</span><span class="o">=</span><span class="n">emb_dim</span> <span class="o">//</span> <span class="n">num_heads</span><span class="p">,</span>
    <span class="n">dropout</span><span class="o">=</span><span class="mf">0.1</span>
<span class="p">)(</span><span class="n">norm_sequence_output</span><span class="p">,</span> <span class="n">sequence_output</span><span class="p">)</span>  <span class="c1"># [batch_size, seq_len, dim]</span>

<span class="n">short_term_output</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">LayerNormalization</span><span class="p">()(</span><span class="n">sequence_output</span><span class="p">)</span>

<span class="c1"># 3. personalized attention: the user profile acts as the query vector</span>
<span class="n">user_attention</span> <span class="o">=</span> <span class="n">UserAttention</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;user_attention_short&#39;</span><span class="p">)</span>
<span class="n">short_term_interest</span> <span class="o">=</span> <span class="n">user_attention</span><span class="p">(</span>
    <span class="n">user_embedding</span><span class="p">,</span>  <span class="c1"># [batch_size, 1, dim] user profile as the query</span>
    <span class="n">short_term_output</span>  <span class="c1"># [batch_size, seq_len, dim] sequence as keys and values</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>
</pre></div>
</div>
<p>The personalized attention layer computes its attention weights as the dot product between the user profile and the sequence features:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">UserAttention</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;User attention layer: the user profile serves as the query vector&quot;&quot;&quot;</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">query_vector</span><span class="p">,</span> <span class="n">key_vectors</span><span class="p">):</span>
        <span class="c1"># attention scores: query · key^T</span>
        <span class="n">attention_scores</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span>
            <span class="n">query_vector</span><span class="p">,</span>  <span class="c1"># [batch_size, 1, dim]</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">transpose</span><span class="p">(</span><span class="n">key_vectors</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span>  <span class="c1"># [batch_size, dim, seq_len]</span>
        <span class="p">)</span>  <span class="c1"># [batch_size, 1, seq_len]</span>

        <span class="n">attention_scores</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">attention_scores</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="n">attention_weights</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">nn</span><span class="o">.</span><span class="n">softmax</span><span class="p">(</span><span class="n">attention_scores</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>

        <span class="c1"># weighted sum yields the context vector</span>
        <span class="n">context_vector</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">attention_weights</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">),</span>
            <span class="n">key_vectors</span>
        <span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>

        <span class="k">return</span> <span class="n">context_vector</span>
</pre></div>
</div>
<p><strong>Capturing Long-Term Interest</strong></p>
<p>Long-term behavior carries rich preference information, but it is modeled differently from short-term behavior. SDM aggregates the long-term history along feature dimensions, splitting it into several subsets by feature
(top-left of <a class="reference internal" href="#sdm-model-architecture"><span class="std std-numref">Fig. 2.3.2</span></a>):</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-12">
<span class="eqno">(2.3.13)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-12" title="Permalink to this equation">¶</a></span>\[\mathcal{L}^{u}=\left\{\mathcal{L}_{f}^{u} \mid f \in \mathcal{F}\right\}\]</div>
<p>Concretely, the subsets are: interacted item IDs <span class="math notranslate nohighlight">\(\mathcal{L}^{u}_{id}\)</span>, leaf categories <span class="math notranslate nohighlight">\(\mathcal{L}^{u}_{leaf}\)</span>, first-level categories <span class="math notranslate nohighlight">\(\mathcal{L}^{u}_{cate}\)</span>, visited shops <span class="math notranslate nohighlight">\(\mathcal{L}^{u}_{shop}\)</span>, and interacted brands <span class="math notranslate nohighlight">\(\mathcal{L}^{u}_{brand}\)</span>. Separating features this way lets the model understand the user's long-term preference patterns from multiple angles.</p>
<p>For each feature subset, the model uses an attention mechanism to compute the user's preference along that dimension. Each feature entity <span class="math notranslate nohighlight">\(f^{u}_{k} \in \mathcal{L}^{u}_{f}\)</span> is mapped through an embedding matrix to a vector <span class="math notranslate nohighlight">\(\boldsymbol{g}^{u}_{k}\)</span>, and the user profile <span class="math notranslate nohighlight">\(\boldsymbol{e}_u\)</span> supplies the attention query:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-13">
<span class="eqno">(2.3.14)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-13" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\alpha_{k} &amp;=\frac{\exp \left(\boldsymbol{g}_{k}^{u T} \boldsymbol{e}_{u}\right)}{\sum_{k=1}^{\left|\mathcal{L}_{f}^{u}\right|} \exp \left(\boldsymbol{g}_{k}^{u T} \boldsymbol{e}_{u}\right)} \\
\boldsymbol{z}_{f}^{u} &amp;=\sum_{k=1}^{\left|\mathcal{L}_{f}^{u}\right|} \alpha_{k} \boldsymbol{g}_{k}^{u}
\end{aligned}\end{split}\]</div>
<p>where <span class="math notranslate nohighlight">\(\left|\mathcal{L}_{f}^{u}\right|\)</span> is the size of the feature subset.</p>
<p>Finally, the representations from all feature dimensions are concatenated and passed through a fully connected network to obtain the long-term interest representation:</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-14">
<span class="eqno">(2.3.15)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-14" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\boldsymbol{z}^{u} &amp;=\operatorname{concat}\left(\left\{\boldsymbol{z}_{f}^{u} \mid f \in \mathcal{F}\right\}\right) \\
\boldsymbol{p}^{u} &amp;=\tanh \left(\boldsymbol{W}^{p} \boldsymbol{z}^{u}+\boldsymbol{b}\right)
\end{aligned}\end{split}\]</div>
<p>where <span class="math notranslate nohighlight">\(\boldsymbol{W}^{p}\)</span> is a weight matrix and <span class="math notranslate nohighlight">\(\boldsymbol{b}\)</span> is a bias vector.</p>
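<p>Putting (2.3.14) and (2.3.15) together, the long-term branch is attention pooling per feature subset followed by a tanh projection. A NumPy walkthrough, with made-up subset names and random values:</p>

```python
# Numerical walkthrough of the long-term branch: attention-pool each
# feature subset with the user profile, concatenate, then a tanh layer.
import numpy as np

rng = np.random.default_rng(0)
d = 4
e_u = rng.normal(size=d)  # user profile embedding

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(G, e_u):
    # G: (n, d) embeddings g_k^u of one feature subset
    alpha = softmax(G @ e_u)  # attention weights, Eq. (2.3.14)
    return alpha @ G          # z_f^u, shape (d,)

subsets = {f: rng.normal(size=(int(rng.integers(2, 6)), d))
           for f in ('item_id', 'leaf_cate', 'cate', 'shop', 'brand')}

z_u = np.concatenate([attention_pool(G, e_u) for G in subsets.values()])
W_p = rng.normal(size=(d, z_u.size))
p_u = np.tanh(W_p @ z_u)      # long-term interest p^u, Eq. (2.3.15)
```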
<p><strong>Core Code</strong></p>
<p>Long-term interest is modeled by feature-wise aggregation, applying an attention mechanism separately within each feature dimension:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># aggregate long-term behavior along each feature dimension</span>
<span class="n">long_history_features</span> <span class="o">=</span> <span class="n">group_embedding_feature_dict</span><span class="p">[</span><span class="s1">&#39;raw_hist_seq_long&#39;</span><span class="p">]</span>

<span class="n">long_term_interests</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">long_history_feature</span> <span class="ow">in</span> <span class="n">long_history_features</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
    <span class="c1"># build a mask for each feature dimension</span>
    <span class="n">long_history_mask</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Lambda</span><span class="p">(</span>
        <span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">tf</span><span class="o">.</span><span class="n">expand_dims</span><span class="p">(</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">cast</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">not_equal</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="o">.</span><span class="n">float32</span><span class="p">),</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span>
        <span class="p">)</span>
    <span class="p">)(</span><span class="n">input_layer_dict</span><span class="p">[</span><span class="n">name</span><span class="p">])</span>  <span class="c1"># [batch_size, max_len_long, 1]</span>

    <span class="c1"># apply the mask to the feature embeddings</span>
    <span class="n">long_history_item_emb</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Lambda</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])(</span>
        <span class="p">[</span><span class="n">long_history_feature</span><span class="p">,</span> <span class="n">long_history_mask</span><span class="p">]</span>
    <span class="p">)</span>  <span class="c1"># [batch_size, max_len_long, dim]</span>

    <span class="c1"># apply user attention within each feature dimension</span>
    <span class="n">user_attention</span> <span class="o">=</span> <span class="n">UserAttention</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sa">f</span><span class="s1">&#39;user_attention_long_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
    <span class="n">long_term_interests</span><span class="o">.</span><span class="n">append</span><span class="p">(</span>
        <span class="n">user_attention</span><span class="p">(</span><span class="n">user_embedding</span><span class="p">,</span> <span class="n">long_history_item_emb</span><span class="p">)</span>
    <span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>

<span class="c1"># concatenate the representations from all feature dimensions</span>
<span class="n">long_term_interests_concat</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Concatenate</span><span class="p">(</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)(</span>
    <span class="n">long_term_interests</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim * len(long_history_features)]</span>

<span class="c1"># fuse through a fully connected layer</span>
<span class="n">long_term_interest</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">emb_dim</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s1">&#39;tanh&#39;</span><span class="p">)(</span>
    <span class="n">long_term_interests_concat</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>
</pre></div>
</div>
<p><strong>Fusing Long- and Short-Term Interests</strong></p>
<p>With both interest representations in hand, the key question is how to fuse them effectively. A user's long-term behavior is rich, but usually only a small part of it is relevant to the current decision, and simple concatenation or weighted averaging struggles to extract that relevant part.</p>
<p>SDM designs a gated fusion mechanism, similar in spirit to the gates of an LSTM
(middle of <a class="reference internal" href="#sdm-model-architecture"><span class="std std-numref">Fig. 2.3.2</span></a>):</p>
<div class="math notranslate nohighlight" id="equation-chapter-1-retrieval-3-sequence-1-user-interests-15">
<span class="eqno">(2.3.16)<a class="headerlink" href="#equation-chapter-1-retrieval-3-sequence-1-user-interests-15" title="Permalink to this equation">¶</a></span>\[\begin{split}\begin{aligned}
\boldsymbol{G}_{t}^{u} &amp;= \operatorname{sigmoid}\left(\boldsymbol{W}^{1} \boldsymbol{e}_{u}+\boldsymbol{W}^{2} \boldsymbol{s}_{t}^{u}+\boldsymbol{W}^{3} \boldsymbol{p}^{u}+\boldsymbol{b}\right) \\
\boldsymbol{o}_{t}^{u} &amp;= \left(1-\boldsymbol{G}_{t}^{u}\right) \odot \boldsymbol{p}^{u}+\boldsymbol{G}_{t}^{u} \odot \boldsymbol{s}_{t}^{u}
\end{aligned}\end{split}\]</div>
<p>Here <span class="math notranslate nohighlight">\(\boldsymbol{G}_{t}^{u} \in \mathbb{R}^{d \times 1}\)</span> is the gate vector, <span class="math notranslate nohighlight">\(\odot\)</span> denotes element-wise multiplication, and <span class="math notranslate nohighlight">\(\boldsymbol{W}^{1}\)</span>, <span class="math notranslate nohighlight">\(\boldsymbol{W}^{2}\)</span>, <span class="math notranslate nohighlight">\(\boldsymbol{W}^{3}\)</span> are weight matrices.</p>
<p>The gate network takes three inputs: the user profile <span class="math notranslate nohighlight">\(\boldsymbol{e}_{u}\)</span>, the short-term interest <span class="math notranslate nohighlight">\(\boldsymbol{s}_{t}^{u}\)</span>, and the long-term interest <span class="math notranslate nohighlight">\(\boldsymbol{p}^{u}\)</span>. Each element of the output gate vector lies between 0 and 1 and determines the mix of long- and short-term interest in that dimension. The model can thus keep long-term information in some dimensions and short-term information in others, avoiding the information loss of a simple average and precisely picking out the parts of long-term behavior most relevant to the current interest.</p>
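<p>The gate of (2.3.16) is a small computation; a NumPy sketch with random placeholder weights:</p>

```python
# The gated fusion of Eq. (2.3.16) in NumPy; all weights are random
# placeholders, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 4

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1, W2, W3 = (rng.normal(size=(d, d)) for _ in range(3))
b = np.zeros(d)

def gated_fusion(e_u, s_t, p_u):
    # per-dimension gate in (0, 1)
    G = sigmoid(W1 @ e_u + W2 @ s_t + W3 @ p_u + b)
    # element-wise mix of long-term (p_u) and short-term (s_t) interest
    return (1.0 - G) * p_u + G * s_t

e_u, s_t, p_u = (rng.normal(size=d) for _ in range(3))
o_t = gated_fusion(e_u, s_t, p_u)
# each element of o_t lies between the matching elements of p_u and s_t
```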
<p><strong>Core Code</strong></p>
<p>The gated fusion mechanism learns three weight matrices that determine how the long- and short-term interests are combined:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">GatedFusion</span><span class="p">(</span><span class="n">tf</span><span class="o">.</span><span class="n">keras</span><span class="o">.</span><span class="n">layers</span><span class="o">.</span><span class="n">Layer</span><span class="p">):</span>
<span class="w">    </span><span class="sd">&quot;&quot;&quot;Gated fusion layer for combining long- and short-term interests&quot;&quot;&quot;</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">build</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_shape</span><span class="p">):</span>
        <span class="n">dim</span> <span class="o">=</span> <span class="n">input_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
        <span class="c1"># one weight matrix each for the user profile, short-term, and long-term interest</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">W1</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span>
            <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">dim</span><span class="p">,</span> <span class="n">dim</span><span class="p">),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;W1&#39;</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">W2</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span>
            <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">dim</span><span class="p">,</span> <span class="n">dim</span><span class="p">),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;W2&#39;</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">W3</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span>
            <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">dim</span><span class="p">,</span> <span class="n">dim</span><span class="p">),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;glorot_uniform&#39;</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;W3&#39;</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">b</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">add_weight</span><span class="p">(</span>
            <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">dim</span><span class="p">,),</span> <span class="n">initializer</span><span class="o">=</span><span class="s1">&#39;zeros&#39;</span><span class="p">,</span> <span class="n">trainable</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">&#39;bias&#39;</span>
        <span class="p">)</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">GatedFusion</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">build</span><span class="p">(</span><span class="n">input_shape</span><span class="p">)</span>

    <span class="k">def</span><span class="w"> </span><span class="nf">call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
        <span class="n">user_embedding</span><span class="p">,</span> <span class="n">short_term</span><span class="p">,</span> <span class="n">long_term</span> <span class="o">=</span> <span class="n">inputs</span>

        <span class="c1"># Compute the gate vector: G = sigmoid(W1·e_u + W2·s_t + W3·p_u + b)</span>
        <span class="n">gate</span> <span class="o">=</span> <span class="n">tf</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">user_embedding</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">W1</span><span class="p">)</span> <span class="o">+</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">short_term</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">W2</span><span class="p">)</span> <span class="o">+</span>
            <span class="n">tf</span><span class="o">.</span><span class="n">matmul</span><span class="p">(</span><span class="n">long_term</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">W3</span><span class="p">)</span> <span class="o">+</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">b</span>
        <span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>

        <span class="c1"># Gated fusion: o_t = (1 - G) ⊙ p_u + G ⊙ s_t</span>
        <span class="n">output</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">gate</span><span class="p">)</span> <span class="o">*</span> <span class="n">long_term</span> <span class="o">+</span> <span class="n">gate</span> <span class="o">*</span> <span class="n">short_term</span>

        <span class="k">return</span> <span class="n">output</span>
</pre></div>
</div>
<p>The complete SDM implementation chains the three modules together:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Short-term interest modeling</span>
<span class="n">short_term_interest</span> <span class="o">=</span> <span class="n">build_short_term_interest</span><span class="p">(</span>
    <span class="n">short_history_item_emb</span><span class="p">,</span> <span class="n">user_embedding</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>

<span class="c1"># Long-term interest modeling</span>
<span class="n">long_term_interest</span> <span class="o">=</span> <span class="n">build_long_term_interest</span><span class="p">(</span>
    <span class="n">long_history_features</span><span class="p">,</span> <span class="n">user_embedding</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>

<span class="c1"># Gated fusion</span>
<span class="n">gated_fusion</span> <span class="o">=</span> <span class="n">GatedFusion</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="s1">&#39;gated_fusion&#39;</span><span class="p">)</span>
<span class="n">final_interest</span> <span class="o">=</span> <span class="n">gated_fusion</span><span class="p">(</span>
    <span class="p">[</span><span class="n">user_embedding</span><span class="p">,</span> <span class="n">short_term_interest</span><span class="p">,</span> <span class="n">long_term_interest</span><span class="p">]</span>
<span class="p">)</span>  <span class="c1"># [batch_size, 1, dim]</span>
</pre></div>
</div>
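<p>At serving time, the fused vector <code class="docutils literal notranslate"><span class="pre">final_interest</span></code> plays the role of the user-side embedding in the usual two-tower retrieval setup: every candidate item is scored by inner product against it, and the highest-scoring items are returned. The sketch below is illustrative only; the random item-embedding matrix and the top-k size are assumptions, and the actual training and evaluation are handled by <code class="docutils literal notranslate"><span class="pre">run_experiment</span></code>:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import numpy as np

rng = np.random.default_rng(0)
final_interest = rng.normal(size=(1, 16))      # fused user interest o_t (illustrative)
item_embeddings = rng.normal(size=(1000, 16))  # candidate item vectors (illustrative)

# Score every candidate by inner product with the user vector
scores = (item_embeddings @ final_interest.T).ravel()  # [num_items]
top_k = np.argsort(-scores)[:10]  # indices of the 10 highest-scoring items
print(top_k.shape)  # (10,)
</pre></div>
</div>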
<p><strong>Training and Evaluation</strong></p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">run_experiment</span><span class="p">(</span><span class="s1">&#39;sdm&#39;</span><span class="p">)</span>
</pre></div>
</div>
<div class="output highlight-default notranslate"><div class="highlight"><pre><span></span><span class="o">+---------------+--------------+-----------+----------+----------------+---------------+</span>
<span class="o">|</span>   <span class="n">hit_rate</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">hit_rate</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>   <span class="n">ndcg</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">ndcg</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>   <span class="n">precision</span><span class="o">@</span><span class="mi">10</span> <span class="o">|</span>   <span class="n">precision</span><span class="o">@</span><span class="mi">5</span> <span class="o">|</span>
<span class="o">+===============+==============+===========+==========+================+===============+</span>
<span class="o">|</span>        <span class="mf">0.0058</span> <span class="o">|</span>       <span class="mf">0.0051</span> <span class="o">|</span>    <span class="mf">0.0046</span> <span class="o">|</span>   <span class="mf">0.0044</span> <span class="o">|</span>         <span class="mf">0.0006</span> <span class="o">|</span>         <span class="mf">0.001</span> <span class="o">|</span>
<span class="o">+---------------+--------------+-----------+----------+----------------+---------------+</span>
</pre></div>
</div>
</section>
</section>


        </div>
        <div class="side-doc-outline">
            <div class="side-doc-outline--content"> 
<div class="localtoc">
    <p class="caption">
      <span class="caption-text">Table Of Contents</span>
    </p>
    <ul>
<li><a class="reference internal" href="#">2.3.1. Deepening User Interest Representations</a><ul>
<li><a class="reference internal" href="#mind">2.3.1.1. MIND: Capturing Diverse User Interests with Multiple Vectors</a></li>
<li><a class="reference internal" href="#sdm">2.3.1.2. SDM: Fusing Long- and Short-Term Interests to Capture Dynamic Change</a></li>
</ul>
</li>
</ul>

</div>
            </div>
        </div>

      <div class="clearer"></div>
    </div><div class="pagenation">
     <a id="button-prev" href="index.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="P">
         <i class="pagenation-arrow-L fas fa-arrow-left fa-lg"></i>
         <div class="pagenation-text">
            <span class="pagenation-direction">Previous</span>
            <div>2.3. Sequential Retrieval</div>
         </div>
     </a>
     <a id="button-next" href="2.generateive_recall.html" class="mdl-button mdl-js-button mdl-js-ripple-effect mdl-button--colored" role="button" accesskey="N">
         <i class="pagenation-arrow-R fas fa-arrow-right fa-lg"></i>
        <div class="pagenation-text">
            <span class="pagenation-direction">Next</span>
            <div>2.3.2. Generative Retrieval Methods</div>
        </div>
     </a>
  </div>
        
        </main>
    </div>
  </body>
</html>